paradigms:
----------
- supervised learning: The input to learning is a LABELED set of data,
  i.e. examples showing how the computer should behave. E.g. the
  computer gets a set of images along with their text descriptions and
  should learn to write descriptions for new images.
- unsupervised learning: The input to learning is an UNLABELED set of
  data and the computer is to find patterns in this data. This includes
  clustering (automatically creating groups of similar items),
  autoencoders (automatically creating a compression algorithm), finding
  associations between data items, detecting anomalies etc.
- reinforcement learning: The machine receives feedback on how well it
  is doing from a function and tries to maximize its performance. E.g.
  the computer is trying to design an algorithm to solve problem X and
  gets feedback from a function that returns the speed of solving the
  problem, making the machine look for the fastest algorithm for X.
- overfitting: A bad thing that happens when you use too complex a
  learning model and/or train it too much. The model will then fit the
  training data exactly but won't generalize well to any other data.

Neural Networks (NN)
====================

Learning models inspired by biological neural networks.

- neuron: Model of a neuron. One type is the perceptron, which is a
  linear classifier. A perceptron has one output and N inputs and is
  parametrized by N weights w1 ... wN (one for each input) and a bias
  B, which define a hyperplane in N-dimensional space that divides it
  into two halves. E.g. given 2 inputs we can write the equation

    input1 * w1 + input2 * w2 + B > 0

  which divides 2D space by a line into a part where the neuron is
  active and a part where it is not. A neuron has an activation
  function after it to produce the final output.
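The perceptron's decision rule can be sketched in a few lines of Python
(a minimal illustration; the AND weights below are just one hand-picked
example of a dividing line, not anything canonical):

```python
def perceptron(inputs, weights, bias):
    # weighted sum of inputs plus bias; the neuron is active if it is > 0
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if s > 0 else 0

# example: weights and bias hand-picked so the line separates
# the point (1, 1) from the other corners, i.e. a logical AND
w = [1.0, 1.0]
b = -1.5
print(perceptron([0, 0], w, b))  # 0
print(perceptron([1, 0], w, b))  # 0
print(perceptron([1, 1], w, b))  # 1
```

Here the line x1 + x2 - 1.5 = 0 is the 2D special case of the
hyperplane defined by the weights and bias.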
- activation function: Function that takes the output of a neuron and
  maps it into the <0,1> interval, usually a logistic function
  f(x) = 1 / (1 + e^(-1 * sharpness * x)):

               ______ 1
           .-''
          /
    ____..-'           0

- neural network: A set of interconnected neurons, in practice
  organized into layers so that each neuron (except in the 1st layer)
  has inputs from all neurons of the previous layer and outputs to all
  neurons of the next layer (except in the last layer). Choosing the
  number of layers and neurons is largely a matter of experimentation.
- backpropagation: Algorithm for training a NN in supervised learning
  (generalizations exist), implementing gradient descent (looking for a
  local minimum in parameter space).

Convolutional Neural Networks (CNN)
===================================

For processing images, also "shift invariant", similar to the human
vision system. The input is an image of size W x H and depth D (e.g.
32 x 32 pixels with depth 3 for RGB). The network contains these types
of layers:

- convolutional layers (CL): The neurons that represent close pixels in
  the image represented by the input layer go to a single neuron in
  this layer, which effectively achieves convolution. The layer
  actually performs N convolutions (which are really cross
  correlations, without flipping the kernel) of the input image of
  depth D0 with N predefined convolutional kernels (there can and
  should be several), each of size Wn x Hn x D0 (e.g. 5 x 5 pixels x 3
  for RGB depth). I.e. the input image of size W0 x H0 x D0 will be
  convolved into an output of size (W0 - Wn + 1) x (H0 - Hn + 1) x N.
  Example: the input image is 32 x 32 x 3 and the convolutional layer
  has 4 kernels, each of size 5 x 5 x 3. The output of the layer will
  be an image 28 x 28 x 4. Note the change in depth (it becomes the
  number of kernels). The image obtained by filtering with each kernel
  is an activation map (since convolution finds features in the image
  similar to the kernel), i.e. a detection of specific features.
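The size arithmetic and the cross correlation itself can be sketched
roughly as follows (a naive single-channel illustration; the function
names and the edge-detector kernel are made up for this example):

```python
def conv_output_size(w0, h0, wn, hn, n_kernels):
    # input W0 x H0 (x D0) filtered by N kernels of size Wn x Hn (x D0)
    # gives an output of size (W0 - Wn + 1) x (H0 - Hn + 1) x N
    return (w0 - wn + 1, h0 - hn + 1, n_kernels)

def cross_correlate(image, kernel):
    # naive single-channel CNN-style "convolution": the kernel is slid
    # over the image without being flipped (i.e. cross correlation)
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            row.append(sum(image[y + j][x + i] * kernel[j][i]
                           for j in range(kh) for i in range(kw)))
        out.append(row)
    return out

print(conv_output_size(32, 32, 5, 5, 4))  # (28, 28, 4), as in the example

edges = cross_correlate(
    [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 1, 1]],
    [[-1, 1]])  # a toy vertical-edge-detecting kernel
print(edges)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The activation map produced by the edge kernel is nonzero exactly where
the image brightness jumps, which is the "feature detection" the text
describes.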
The first layer in the network is always convolutional (others don't
make sense there). This one detects low-level features (lines, corners
etc.). From these features each consecutive convolutional layer detects
progressively higher level features (faces, cars etc.).

- pooling layers (PL): Scale down the image by some factor, e.g. 2x.
  There are different methods to do this, the most common being max
  (take the maximum pixel of each area), but average or min can also be
  used.
- fully connected layers (FC): Connect every neuron of the previous
  layer to every neuron in this layer. These come at the end of the
  network, after all the convolutional and pooling layers, and work
  like a normal general neural network to create the final output.

The output of the network is a C dimensional array of numbers, C being
the number of classes into which we want to classify. Each element of
the array gives the probability of the corresponding class. A typical
CNN looks like this:

  input image -> CL -> PL -> CL -> PL -> ... -> FC -> FC -> ... -> output

Generative Adversarial Networks (GAN)
=====================================

GANs are based on the competition of two networks:

1. generative: Tries to generate new data that look like the training
   data. The goal of this network is to maximize the error of the
   discriminative network (i.e. trick it into believing the synthesized
   results are real).
2. discriminative: Tries to learn to distinguish between the real and
   synthesized data.

As learning goes on, both networks get better: 1. at generating
faithful data, 2. at spotting fakes, in turn forcing 1. to get better
etc.
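The alternating training can be illustrated with a deliberately tiny 1D
toy (everything here is a made-up illustration, not a practical GAN):
the "real" data is the constant value 5, the generator is a single
parameter mu, and the discriminator is a logistic classifier
D(x) = sigmoid(w * x + b). Each iteration takes one gradient step for
the discriminator, then one for the generator:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

real = 5.0                  # the "dataset": a single constant value
mu = 0.0                    # generator parameter (its output IS mu)
w, b = 0.0, 0.0             # discriminator parameters
lr_d, lr_g = 0.1, 0.05      # learning rates, picked by hand

for _ in range(5000):
    fake = mu
    # discriminator step: minimize -log D(real) - log(1 - D(fake)),
    # i.e. get better at telling real from synthesized
    d_real = sigmoid(w * real + b)
    d_fake = sigmoid(w * fake + b)
    w -= lr_d * (-(1 - d_real) * real + d_fake * fake)
    b -= lr_d * (-(1 - d_real) + d_fake)
    # generator step: minimize -log D(fake), i.e. maximize the
    # discriminator's error on the synthesized value
    d_fake = sigmoid(w * fake + b)
    mu -= lr_g * (-(1 - d_fake) * w)

print(mu)  # drifts toward the real value 5 as the two networks compete
```

Real GANs replace mu and (w, b) with full neural networks and feed the
generator random noise, but the alternating minimize/maximize structure
is the same.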